Can we do better than frequency? A case study on extracting PP-verb collocations
Abstract
We argue that lexical association measures (AMs) should be evaluated against a reference set of collocations manually extracted from the full candidate data, and that the notion of collocation needs to be precisely defined so that human collocativity judgments and experimental results are reproducible. We show that identification results achieved by particular AMs do not crucially depend on text type, but that some AMs are much better suited for identifying some classes of collocations than others.
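The evaluation setup argued for above can be illustrated with a minimal sketch: rank the same candidate pairs once by raw frequency and once by an association measure, then compute precision among the top-ranked candidates against a manually compiled reference set. The t-score is used here only as one familiar AM; the abstract does not say which measures were tested. The candidate pairs, counts, sample size, and reference set are invented toy values, and the helper names `t_score` and `precision_at` are hypothetical.

```python
import math

def t_score(joint, f1, f2, n):
    """t-score from joint frequency, the two marginal frequencies, and sample size n."""
    expected = f1 * f2 / n
    return (joint - expected) / math.sqrt(joint)

def precision_at(ranked, reference, k):
    """Share of the k highest-ranked candidates that appear in the reference set."""
    return sum(1 for cand in ranked[:k] if cand in reference) / k

# Toy data (invented for illustration): (PP, verb) -> (joint, PP freq, verb freq)
candidates = {
    ("zur Verfügung", "stehen"): (50, 60, 400),
    ("in Frage", "kommen"): (30, 35, 500),
    ("am Montag", "sagen"): (40, 800, 900),
    ("im Jahr", "sein"): (60, 1000, 3000),
}
N = 10000  # assumed total number of candidate pair tokens

# Hypothetical manually annotated reference set of true collocations
reference = {("zur Verfügung", "stehen"), ("in Frage", "kommen")}

# Ranking 1: raw co-occurrence frequency, highest first
by_freq = sorted(candidates, key=lambda c: candidates[c][0], reverse=True)

# Ranking 2: t-score, highest first
by_t = sorted(candidates, key=lambda c: t_score(*candidates[c], N), reverse=True)

print("precision@2, frequency:", precision_at(by_freq, reference, 2))
print("precision@2, t-score:  ", precision_at(by_t, reference, 2))
```

The numbers this prints reflect only the toy counts and say nothing about the paper's findings; the point is that both rankings are scored against the same reference set drawn from the full candidate data.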
Similar resources
Acquisition of Phraseological Units from Linguistically Interpreted Corpora: A Case Study on German PP-Verb Collocations
In this paper, we show that accessibility of syntactic information eases collocation extraction from corpora, and supports identification of lexical and structural restrictions related to collocations. For collocation identification we use a corpus that is automatically annotated applying a part-of-speech tagger and a phrase chunker.
False Paraphrase Pairs in Spanish for Verbs and Verb+Noun Collocations
In this paper we study some pairs of paraphrases found in a linguistic resource called badele.3000, a database that contains more than 3,600 high-frequency Spanish nouns and 2,800 high-frequency Spanish verbs. The restricted combinatorics of both kinds of words yields more than 23,000 collocations, which are expressed by Lexical Functions, a tool of Meaning-Text Theory. Through...
Issues in defining/extracting collocations in Japanese and Korean: Empirical implications for building a collocation database
Collocations in Japanese and Korean have been studied extensively based on statistical tools. The criteria for collocations in these languages, however, have not been fully established in the literature, and it is not obvious whether all statistically significant combinations of words could be regarded as collocations. In this article, we point out empirical problems in extracting collocations ...
Experiments on Candidate Data for Collocation Extraction
The paper describes ongoing work on the evaluation of methods for extracting collocation candidates from large text corpora. Our research is based on a German treebank corpus used as gold standard. Results are available for adjective+noun pairs, which proved to be a comparatively easy extraction task. We plan to extend the evaluation to other types of collocations (e.g., PP+verb pairs).
A Corpus-Based Study on High Frequency Verb Collocations in the Case of “HAVE”
On the basis of a corpus-driven approach, this research investigates high-frequency verb collocations, taking the verb have as used by Chinese non-English-major learners as its case. Results show that although the learners use the verb have very frequently, they draw on relatively few collocation types. The learners tend to simply overuse the words related to the topic or given by the writing direction...